The Use of Configurations of Purchase Likelihoods
to  Predict  Auto  Purchases 

ABSTRACT 


Using consumer panel data, this paper explores the extent to which
automobile purchases can be predicted by functions containing purchase
likelihoods for automobiles and for other durable goods, both in the very
recent past and going back further. Despite low goodness-of-fit, the
accuracy of predictions of auto purchases is found to be quite high when
a logit model is employed.


The Use of Configurations of Purchase Likelihoods
to Predict Auto Purchases

Ellen Liebman and Robert Ferber

Introduction

The work reported here is an outgrowth of a previous empirical exploration
using data from the first two waves of the Illinois-Berkeley panel
of young married couples in Peoria and Decatur, Illinois. These couples
had been married in the summer of 1968, and the husband was 30 years of age
or less at the time. The couples (313 at the start) were interviewed that
fall and approximately every six months thereafter.

The earlier analysis had both substantive and methodological concerns.
Our substantive concern was whether the configuration of reported
likelihoods to buy 13 different durable goods at one time
showed any relationship to the purchase of one of those durables, automobiles,
six months later. Likelihoods were obtained as subjective probabilities
on a scale from 0 to 100. The focus was on auto purchases in view of
their magnitude and of their high frequency of purchase.

Implicit was the hope that the likelihood variables would absorb enough of
the complex of factors involved in deciding to make a purchase that they and
the constraints arising from their interactions would yield enough information
to predict purchases. The validity of this assumption clearly must vary with the
amount of time covered by the likelihood assessment. Thus, assessments
made today will function better as indicators of the factors relevant to
what will be done tomorrow than they will for what will be done next month,
etc. Since the likelihood questions covered a horizon of six months, and
sometimes more, it is not clear how much predictive power it is reasonable
to attribute to such variables under these circumstances, nor whether
failure should indict likelihoods in general or just ones covering long
periods of time.

The approach in this earlier study was to try to predict purchases of
autos reported on Wave 2 as a linear arithmetic function of 13 purchase
likelihoods (and various socioeconomic variables) obtained on the first wave.
The 13 purchase likelihoods referred to automobiles and to 12 other major
durables, which are identified in the stub of Table 1.

The methodological issue arose when two methods of making these
predictions were compared: that based on the regression model and that
based on the logit model. The regression model is much simpler, but its
basic assumptions are contradicted by the use of a 0-1 dependent variable,
which clearly does not satisfy the assumption of normality. Further, it
does not restrict the estimates to the allowable range for the dependent
variable.

The logit model, on the other hand, is designed for precisely such a
situation by expressing the probability of purchase as a simple function of
the independent variables. Thus, instead of

$$B = c + \sum_{i=1}^{13} a_i L_i$$

where $B$ is 1 or 0 according to whether the auto is purchased or not, the
$L_i$ are the likelihoods, the $a_i$ their coefficients, and $c$ a constant,
we have:

$$P = \frac{1}{1 + \exp\left[-\left(c' + \sum_{i=1}^{13} a_i' L_i\right)\right]}$$

where $P$ is the probability of purchasing an auto, the $L_i$ are as before,
and the primed terms are the same as before, but marked in this way to
indicate that they are the results of a different estimation process.
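
To make the contrast between the two forms concrete, the short Python sketch below evaluates each for a single household. The logistic form is the standard one and is assumed here. The coefficient values are invented for illustration and are not estimates from the panel, and likelihoods are rescaled from the 0-100 scale to 0-1.

```python
import math

def linear_prediction(c, a, L):
    """Linear form: B = c + sum_i a_i * L_i (may fall outside the 0-1 range)."""
    return c + sum(ai * Li for ai, Li in zip(a, L))

def logit_probability(c, a, L):
    """Logistic form: P = 1 / (1 + exp(-(c + sum_i a_i * L_i)))."""
    z = c + sum(ai * Li for ai, Li in zip(a, L))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for two likelihoods (auto and one other durable).
c_lin, a_lin = 0.10, [1.20, 0.30]
c_log, a_log = -2.00, [4.00, 1.00]

# A household reporting 100 on the 0-100 scale for both goods.
household = [1.0, 1.0]

b_hat = linear_prediction(c_lin, a_lin, household)
p_hat = logit_probability(c_log, a_log, household)

print(f"linear prediction: {b_hat:.2f}")
print(f"logit probability: {p_hat:.3f}")
```

With these made-up numbers the linear form yields a value above 1, which cannot be read as a probability, while the logistic form stays strictly between 0 and 1 whatever the inputs.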

The results of these computations suggested that at least for the second
wave both auto likelihood and some of the other 12 likelihoods made
significant contributions to the prediction of auto purchases six months
later. Income also was highly significant, but none of the other
socioeconomic variables was. Hence, with respect to our substantive question we had
some support that configurations of likelihoods helped improve predictions
of auto purchases. In addition, comparison of the two methods of fit indicated
that the logistic technique did somewhat better in terms of goodness of fit
and significance of parameter estimates.

Present  Objective 

The  present  paper  represents  an  attempt  to  determine  by  purely  statistical 
methods  the  extent  to  which  the  preceding  results  remain  valid  when  applied 
to the first nine waves of data for this panel, that is, covering a
five-year period. The principal question to be answered on the substantive side
is  whether  the  significance  of  the  configuration  of  purchase  likelihoods 
relative  to  later  auto  purchases  stood  up  through  time.   In  other  words,  do 
the  purchase  likelihoods  for  these  other  products  help  to  predict  auto 
purchases  over  and  above  the  information  contained  in  the  purchase  likelihood 
for  autos  alone? 

It  was  also  of  interest  to  ascertain  whether  these  likelihoods  retained 
significance  after  inclusion  of  the  principal  socioeconomic  variables  that 
seemed  to  influence  auto  purchases  in  these  panel  data,  namely,  income  level, 
number of children and whether the wife was working.

From a methodological point of view, we were interested in exploring
further the relative merits of regression analysis and logit analysis.
While logit analysis is theoretically more defensible, if essentially the
same results are obtained by the simpler regression analysis, a case could
be made for its use despite its theoretical deficiencies.

The plan of this exploratory study was to test four different approaches
on these data. First, as a start and as a basis for later comparison, auto
purchases on Waves 2 through 9 were expressed as a simple function of auto
likelihoods on the preceding wave, and the parameters estimated by both
regression and logit analysis.
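
As an illustration of this first step, the sketch below fits both a simple regression and a one-variable logit to the same small data set. The ten observations are invented, the likelihood is rescaled to 0-1, and Newton's method is simply one convenient way to obtain the maximum-likelihood logit estimates. None of this is claimed to reproduce the estimation procedure or the data actually used in the study.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_ols(x, y):
    """Least-squares intercept and slope for y = c + a * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_logit(x, y, iterations=25):
    """Maximum-likelihood intercept and slope for P = logistic(c + a * x),
    found by Newton's method with the 2x2 system solved explicitly."""
    c, a = 0.0, 0.0
    for _ in range(iterations):
        p = [logistic(c + a * xi) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Negative Hessian (information matrix).
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        c += (h11 * g0 - h01 * g1) / det
        a += (h00 * g1 - h01 * g0) / det
    return c, a

# Invented data: prior-wave auto likelihood (0-1) and purchase (1) or not (0).
likelihood = [0.0, 0.0, 0.1, 0.2, 0.3, 0.5, 0.5, 0.7, 0.8, 1.0]
purchased = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

c_ols, a_ols = fit_ols(likelihood, purchased)
c_logit, a_logit = fit_logit(likelihood, purchased)

print(f"regression: B = {c_ols:.3f} + {a_ols:.3f} L")
print(f"logit:      P = logistic({c_logit:.3f} + {a_logit:.3f} L)")
```

Both fits give the auto likelihood a positive coefficient here. The two sets of coefficients are on different scales and are not directly comparable in size.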

Second, the configuration of likelihoods was introduced by expressing
auto purchases on one wave as a function of all 13 purchase likelihoods
on the preceding wave as independent variables. In Waves 6, 8 and 9,
separate likelihoods were available for both the husband and the wife.
Since there was no clear basis for using one set rather than the other,
separate functions were fitted using each set. Again the parameters were
estimated both by regression and by logit analysis.

Third, an attempt was made to explore the effect of only past purchases
of automobiles and auto purchase likelihoods on actual purchases at a later
time. This was done for Wave 9, using as independent variables auto purchases
on each of the preceding seven waves and also auto likelihood on each of the
preceding eight waves. Once more, the parameters were estimated both by
regression analysis and by logit analysis.

Fourth, the three socioeconomic variables that seemed especially important
from the previous study were added to the functions tested in the third step.
These variables were income, number of children, and whether or not the wife
was working.

Substantive Results

Estimates of the parameters of the logit function for Waves 2 through 8
for the prediction of auto purchase in Wave t given the purchase likelihoods
on the preceding wave are given in Table 1. In addition to the usual
measures of significance of the coefficients, two overall measures of
goodness of fit are provided. One measure is a goodness-of-fit statistic
associated with the logit method, the Likelihood Ratio Index (LRI). The
value of LRI, like the value of $R^2$, ranges between 0 and 1. However, no
clear algebraic relationship exists between these two statistics and the
values obtained are not comparable. Moreover, LRI, unlike $R^2$, has no known
distributional properties. Its major use lies in making comparisons among logit
functions.

The second overall measure of fit presented in Table 1, the Likelihood
Ratio (LR) statistic, does have distributional properties, being a chi-squared
statistic, but is unbounded in range. Because of its distributional
properties, significance levels can be attached to this statistic, as is
done in the table.
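
The relation between the two measures can be sketched in a few lines of Python. The definitions used are the common ones and are an assumption here, since the paper does not spell them out: LRI = 1 - lnL(model) / lnL(null) and LR = 2 [lnL(model) - lnL(null)], with the null model taken to be a constant-only function that assigns every household the sample purchase share. The outcomes and fitted probabilities are invented.

```python
import math

def log_likelihood(outcomes, probabilities):
    """Log-likelihood of 0/1 outcomes under the given purchase probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(outcomes, probabilities)
    )

def likelihood_ratio_measures(outcomes, fitted):
    """Return (LRI, LR statistic) relative to a constant-only null model."""
    share = sum(outcomes) / len(outcomes)
    ll_null = log_likelihood(outcomes, [share] * len(outcomes))
    ll_model = log_likelihood(outcomes, fitted)
    lri = 1.0 - ll_model / ll_null       # bounded between 0 and 1
    lr = 2.0 * (ll_model - ll_null)      # unbounded, chi-squared under the null
    return lri, lr

# Invented example: four households, two of which bought an auto.
outcomes = [1, 0, 1, 0]
fitted = [0.8, 0.2, 0.8, 0.2]

lri, lr = likelihood_ratio_measures(outcomes, fitted)
print(f"LRI = {lri:.3f}, LR statistic = {lr:.3f}")
```

Under these definitions the LR statistic grows with the number of observations while the LRI does not, which is one reason the former carries a significance level and the latter serves mainly for comparing logit functions.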

As is evident from Table 1, when we consider the configuration
of likelihoods, as a rule the auto likelihood is the only one to
be consistently statistically significant; the others crop up infrequently
and without apparent pattern. The overall fit is statistically significant
in most instances, but this is almost entirely due to the auto likelihood
variable. The addition of purchase variables from the preceding wave
did not produce statistically significant results, and these are therefore not
shown here.

The second approach, relating purchases to all previous purchases and
all previous auto likelihoods, indicates a somewhat erratic pattern (Table 2),
the most important variable being auto purchase likelihood on the preceding
wave. The signs and significant variables suggest a possible three-year
buying cycle in cars, but the net contribution of these variables to the
explanation of purchases in Wave 9 is relatively small. Moreover, the
addition of the three socioeconomic variables, shown in the second column
of coefficients in Table 2, contributes somewhat more to the goodness of
fit, judging by the increased value of the likelihood ratio statistic.
Indeed, comparison of the likelihood ratio statistics for various combinations
of variables shows that these three variables (particularly number of
children) do make a statistically significant contribution to the goodness
of fit, as does the set of past purchase variables.
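
One such comparison can be worked through with the two LR statistics reported in Table 2 (60.8 without and 70.7 with the three socioeconomic variables). The sketch assumes that the two functions are nested and fitted to the same households, so that the difference between the statistics is itself chi-squared with 3 degrees of freedom. The closed-form tail probability used below holds only for 3 degrees of freedom.

```python
import math

def chi2_upper_tail_3df(x):
    """Upper-tail probability of a chi-squared variable with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

lr_without = 60.8   # Table 2, first column of coefficients
lr_with = 70.7      # Table 2, second column (adds three socioeconomic variables)

lr_gain = lr_with - lr_without
p_value = chi2_upper_tail_3df(lr_gain)

print(f"gain in LR statistic: {lr_gain:.1f} on 3 degrees of freedom")
print(f"approximate p-value:  {p_value:.3f}")
```

On these assumptions the gain of about 9.9 exceeds the .05 critical value of 7.815 for 3 degrees of freedom but not the .01 value of 11.345.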

From this, we can draw two sets of conclusions. First, our answers
to whether configurations of likelihoods are important and whether they
form patterns through time are both in the negative. We cannot conclude,
as before, that the whole configuration is important. In fact, only auto
likelihoods seem relevant. Second, the configuration of past purchases
seems to have an effect on present ones: the immediately previous purchase
by itself does not, but purchases two or three years earlier do seem
to influence current purchases.

TABLE 2.  LOGIT COEFFICIENTS ON PROBABILITY OF BUYING AUTO ON WAVE 9 AS A FUNCTION
OF PREVIOUS PURCHASES, PREVIOUS LIKELIHOODS, AND OTHER VARIABLES
(N=129)

Symbol     Variable                       Estimated coefficient

B2         Buying - wave 2                  .011        .034
B3         Buying - wave 3                  .425        .561
B4         Buying - wave 4                  .503a       .639b
B5         Buying - wave 5                 -.656a      -.942b
B6         Buying - wave 6                 -.238       -.308
B7         Buying - wave 7                  .227        .349
B8         Buying - wave 8                 -.412       -.454
LA1        Auto likelihood - wave 1        -.509       -.895*
LA2        Auto likelihood - wave 2         .694a      1.100a
LA3        Auto likelihood - wave 3        -.018        .224
LA4        Auto likelihood - wave 4        -.594a      -.665a
LA5H       Auto likelihood - wave 5         .313        .203
LA5W       Auto likelihood - wave 5        -.978b     -1.167b
LA6        Auto likelihood - wave 6        -.137       -.361
LA7H       Auto likelihood - wave 7         .393        .653
LA7W       Auto likelihood - wave 7         .232        .220
LA8H       Auto likelihood - wave 8         .230        .291
LA8W       Auto likelihood - wave 8        1.218c      1.487*
IN69       Income 1969                                  .522
KIDS73     No. children 1973                           -.644*
WWORK73    Wife working 1973                            .100
C          Constant                        -.602       -.774

           LRI                              .34         .40
           LR statistic                   60.8d       70.7d

a Significant at .10 level.
b Significant at .05 level.
c Significant at .01 level.
d Significant at .001 level.

Logit versus Regression

Regression results corresponding to the logit functions shown in
Table 1 yield a very similar pattern of significance, as is shown in
Table 3. As before, the auto likelihood variable for the preceding wave
is almost invariably statistically significant at the .05 level or beyond,
while the other purchase likelihoods are statistically significant only
occasionally and not with any particular pattern. Moreover, the net
contribution of these other variables to the auto likelihood variable
is not statistically significant.

The  coefficients  estimated  by  the  two  procedures  (exclusive  of 
constants)  have  practically  all  the  same  signs.   They  differ  only  in  several 
cases  which  are  not  statistically  significant.  A  few  differences  do  arise 
with  respect  to  significance.   In  Tables  1  and  3,  the  logit  and  regression 
results  have  16  variables  in  common  that  are  significant,  and  each  one 
contributes  four  variables  that  are  significant  and  not  matched  by  the  other. 

In a similar manner, the results from the regression analysis using
Wave 9 purchases as a function of previous purchases, previous auto likelihoods
and other variables are very similar to the logit results shown in Table 2.
The same past-purchase and purchase-likelihood variables are significant in
the regression equations (Table 4) as in the logit equation. The principal
difference is that with the regression function including the three
socioeconomic variables, the income variable is statistically significant at
the .10 level while the variable for the number of children is not statistically
significant. For both types of methods, however, the addition of the
socioeconomic variables produces a moderate increase in the goodness of fit.

TABLE 4.  REGRESSION OF AUTO PURCHASE ON WAVE 9 ON PREVIOUS AUTO PURCHASES,
PREVIOUS AUTO LIKELIHOODS, AND OTHER VARIABLES

Symbol     Variable                       Estimated coefficient

B2         Buying wave 2                    .003       -.0003
B3         Buying wave 3                    .052        .050
B4         Buying wave 4                    .085        .092b
B5         Buying wave 5                   -.085*      -.086a
B6         Buying wave 6                   -.021       -.024
B7         Buying wave 7                    .035        .043
B8         Buying wave 8                   -.065       -.067
LA1        Auto likelihood wave 1          -.077*      -.096
LA2        Auto likelihood wave 2           .099*       .117*
LA3        Auto likelihood wave 3          -.016       -.002
LA4        Auto likelihood wave 4          -.076       -.077
LA5H       Auto likelihood wave 5           .044        .034
LA5W       Auto likelihood wave 5          -.150*      -.151*
LA6        Auto likelihood wave 6          -.018       -.046
LA7H       Auto likelihood wave 7           .039        .062
LA7W       Auto likelihood wave 7           .043        .034
LA8H       Auto likelihood wave 8           .045        .054
LA8W       Auto likelihood wave 8           .168c       .181*
IN69       Income 1969                                  .074a
KIDS73     No. children 1973                           -.064
WWORK73    Wife working 1973                            .016
C          Constant                         .395d       .395c

           R^2                              .35         .39
           R^2 adj                          .24         .27

a Significant at .10 level.
b Significant at .05 level.
d Significant at .001 level.

We do find an almost universal tendency for the LRI to be greater than
$R^2$, both for Tables 1 and 3, and for Tables 2 and 4. It is not clear what,
if anything, this shows. More important is that the LR statistic both
shows slightly higher levels of significance and is significant more often
than $R^2$. This suggests a better fit from the logit method, although how
much is not clear.

To obtain an idea of the relative prediction abilities of these two
approaches, each of the functions was used to derive an error classification
matrix, that is, to actually "predict" the auto purchase of each household
so that the accuracy of classification of each function could be assessed.
Since this test is applied in the present case to the same observations from
which the parameters were estimated, the possibility of search bias is quite high,
and it is a pity that not enough observations were available to enable the
sample to be split into separate analysis and validation samples. Nevertheless,
these results should provide some indication of the extent, if any,
to which these two methods differ.
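
A minimal sketch of such a classification matrix follows. The cutoff of 0.5 for turning a fitted value into a predicted purchase is an assumption, since the classification rule is not stated, and the eight households and their fitted values are invented.

```python
def classification_matrix(actual, fitted, cutoff=0.5):
    """Count households by (actual outcome, predicted outcome).

    A household is predicted to purchase when its fitted value is at
    least `cutoff` (an assumed rule, not taken from the paper).
    """
    counts = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
    for y, value in zip(actual, fitted):
        predicted = 1 if value >= cutoff else 0
        counts[(y, predicted)] += 1
    return counts

def percent_correct(counts):
    """Share of households on the diagonal of the matrix, in percent."""
    total = sum(counts.values())
    return 100.0 * (counts[(1, 1)] + counts[(0, 0)]) / total

# Invented example: actual purchases and fitted values for eight households.
actual = [1, 0, 0, 1, 0, 1, 0, 0]
fitted = [0.80, 0.30, 0.60, 0.40, 0.10, 0.70, 0.20, 0.45]

matrix = classification_matrix(actual, fitted)
accuracy = percent_correct(matrix)

print("buyers predicted to buy:        ", matrix[(1, 1)])
print("buyers predicted not to buy:    ", matrix[(1, 0)])
print("non-buyers predicted to buy:    ", matrix[(0, 1)])
print("non-buyers predicted not to buy:", matrix[(0, 0)])
print(f"percent classified correctly: {accuracy:.1f}")
```

The same routine applies to fitted values from either the logit or the regression function, which is what makes the two methods directly comparable on this criterion.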

The results of these computations are shown in Table 5. Rather surprisingly,
they show that, despite the apparent similarity of the two sets
of results in terms of significant coefficients and goodness-of-fit, the
logit model has an accuracy of classification either approximately equal to,
or far superior to, that of the regression model. This is especially true for the
Wave 9 functions that combine previous auto likelihoods with previous auto
purchases.

Equally interesting is the fact that, contrary to the results obtained
with the goodness-of-fit measure, higher accuracy of classification is
obtained for some of the earlier wave functions using only the 13 purchase
likelihoods or only sequences of previous auto purchase likelihoods than the
more complex Wave 9 functions. This is true both of the logit model and of

TABLE 5.  PERCENT OF AUTO PURCHASE REPORTS CLASSIFIED CORRECTLY BY ALTERNATIVE LOGIT
AND LINEAR REGRESSION MODELS, BY WAVE

Model                           Wave*     Logit     Regression

All 13 likelihoods              2         82.9%     78.2%
                                3         79.8      76.7
                                4         82.2      82.9
                                5         76.0      69.0
                                6H        63.6      41.7
                                6W        65.9      45.0
                                7         65.9      34.9
                                8H        85.3      79.0
                                8W        82.2      77.5
                                9H        73.6      55.8
                                9W        72.1      55.0

Previous auto likelihoods       2         78.3%     51.9%
                                3         79.1      79.8
                                4         82.2      82.9
                                5         74.4      75.2
                                6H        53.5      41.1
                                6W        55.8      56.6
                                7         59.7      30.2
                                8H        80.6      81.4
                                8W        79.8      81.4
                                9H        68.2      58.1
                                9W        70.5      56.6

Previous auto likelihoods       9         74.4%     52.7%
and auto purchases

Previous auto likelihoods       9         78.3%     51.9%
and auto purchases and
socioeconomic variables

* The H's and W's refer to which of the two sets of likelihoods (i.e., husband's
or wife's) available on the wave were used as independent variables.

the regression model. However, for Wave 9, which had the most combinations
tested, the accuracy of classification of the logit model (but not of the
regression model) is highest when auto likelihood and auto purchase are
combined with the socioeconomic variables. Especially interesting is the
fact that the actual percent purchasing autos on Wave 9 was 40%, so that the
logit results but not the regression results are far better than what would
be obtained by a naive model estimate.
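
The comparison with a naive model can be made explicit under one plausible reading, which is an assumption here because the naive model is not defined: predict the more common outcome (no purchase) for every household. With 40% of households purchasing, that rule classifies 60% correctly. The Wave 9 percentages below are those reported in Table 5 for the functions including the socioeconomic variables.

```python
purchase_share = 0.40  # share of households buying an auto on Wave 9

# Naive rule (assumed): always predict the more frequent outcome.
naive_accuracy = 100.0 * max(purchase_share, 1.0 - purchase_share)

# Percent classified correctly, Wave 9, likelihoods + purchases + socioeconomic.
wave9_accuracy = {"logit": 78.3, "regression": 51.9}

margins = {model: value - naive_accuracy for model, value in wave9_accuracy.items()}

print(f"naive model: {naive_accuracy:.1f}% correct")
for model, margin in margins.items():
    print(f"{model}: {margin:+.1f} points relative to the naive model")
```

On this reading the logit function improves on the naive rule by roughly 18 points, while the regression function falls short of it.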

These  results  should  be  treated  with  caution  since  they  are  obtained 
from  the  same  observations  used  to  estimate  the  parameters  of  the  models. 
Nevertheless,  they  support  the  finding  obtained  many  times  in  the  past  of 
the  unreliability  of  goodness-of-fit  as  a  measure  of  predictive  accuracy. 
Moreover,  they  leave  the  strong  supposition  that  the  logit  model  is  no 
worse  than  the  regression  model  and  may  be  considerably  better.   Indeed, 
if  the  logit  model  is  as  accurate  in  classifying  other  types  of  observations 
as  it  is  for  these  data,  it  would  seem  to  provide  a  very  useful  forecasting 
tool both for micro and macro purposes.


